Overview

Dataset Statistics

Number of Variables 9
Number of Rows 100000
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 3854
Duplicate Rows (%) 3.9%
Total Size in Memory 17.3 MB
Average Row Size in Memory 181.5 B
Variable Types
  • Categorical: 5
  • Numerical: 4
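A minimal sketch of how the Dataset Statistics counts can be reproduced with pandas, assuming the data is loaded into a DataFrame. The helper name `overview` and the toy rows are hypothetical. The sketch also assumes the report counts a duplicate as any row that repeats an earlier one (pandas' default `keep="first"`), so the first copy of each repeated row is not counted.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Dataset-level counts similar to the Dataset Statistics table (hypothetical helper)."""
    n_rows, n_vars = df.shape
    missing = int(df.isna().sum().sum())  # missing cells across all columns
    # duplicated() flags each row that repeats an earlier row; the first copy is not flagged
    dups = int(df.duplicated().sum())
    return {
        "variables": n_vars,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / (n_rows * n_vars),
        "duplicate_rows": dups,
        "duplicate_rows_pct": 100 * dups / n_rows,
    }


# Hypothetical toy data: rows 0 and 2 are identical
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male"],
    "age": [80.0, 28.0, 80.0, 36.0],
    "diabetes": [0, 0, 0, 1],
})
stats = overview(toy)
print(stats["duplicate_rows"], stats["duplicate_rows_pct"])  # 1 25.0
```

On the full data, 3854 / 100000 = 3.854%, which appears as 3.9% in the table above and as 3.85% in the insights below.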

Dataset Insights

bmi is skewed
HbA1c_level is skewed
blood_glucose_level is skewed
Dataset has 3854 (3.85%) duplicate rows
hypertension has constant length 1
heart_disease has constant length 1
diabetes has constant length 1
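The "constant length" insight means that every value, read as text, has the same number of characters. For the three 0/1 columns that length is 1, so the flag reflects the single-digit encoding rather than a data-quality problem. A plain-Python sketch of the check; the function name is hypothetical, not the profiler's API:

```python
def has_constant_length(values) -> bool:
    """True when every value, converted to text, has the same character count (hypothetical helper)."""
    lengths = {len(str(v)) for v in values}
    return len(lengths) == 1


print(has_constant_length([0, 0, 0, 0, 1]))    # True: every value has length 1
print(has_constant_length(["Female", "Male"]))  # False: lengths 6 and 4
```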

Variables


gender

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.7 MB

Length

Mean 5.1712
Standard Deviation 0.9851
Median 6
Minimum 4
Maximum 6

Sample

1st row Female
2nd row Female
3rd row Male
4th row Female
5th row Male

Character Categories

Letter Count 517122
Lowercase Letter 417122
Space Separator 0
Uppercase Letter 100000
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Female, Male) take over 50.0%
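The character breakdown uses Unicode general categories: Lu (Uppercase Letter), Ll (Lowercase Letter), Zs (Space Separator), Pd (Dash Punctuation) and Nd (Decimal Number). The letter count is the sum of the upper- and lowercase counts (417122 + 100000 = 517122). A minimal sketch with Python's `unicodedata` module; the helper name is hypothetical, and it assumes every character of every value is counted:

```python
import unicodedata
from collections import Counter

# Unicode general category codes behind the table's labels
LABELS = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_categories(values) -> dict:
    """Count characters per Unicode category across all values (hypothetical helper)."""
    counts = Counter(unicodedata.category(ch) for v in values for ch in str(v))
    table = {label: counts.get(code, 0) for code, label in LABELS.items()}
    # The letter count covers letters only: upper case plus lower case
    table["Letter Count"] = table["Uppercase Letter"] + table["Lowercase Letter"]
    return table


sample = ["Female", "Female", "Male", "Female", "Male"]  # first five rows above
print(char_categories(sample))
```

On the five sample rows this gives 5 uppercase and 21 lowercase letters (26 letters in total). On the full column, an Uppercase Letter count equal to the row count (100000) is consistent with each value starting with a single capital letter.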

age

numerical

Approximate Distinct Count 102
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.5 MB
Mean 41.8859
Minimum 0.08
Maximum 80
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • age is only slightly skewed left (γ1 = -0.052, close to 0)

Quantile Statistics

Minimum 0.08
5-th Percentile 4
Q1 24
Median 43
Q3 60
95-th Percentile 80
Maximum 80
Range 79.92
IQR 36
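Range and IQR follow directly from the other rows of the table: 80 - 0.08 = 79.92 and Q3 - Q1 = 60 - 24 = 36. A stdlib sketch (hypothetical helper name); the report does not say how it interpolates quantiles, so `method="inclusive"` (linear interpolation between order statistics, the same convention as NumPy's default percentile) is an assumption, and other methods can shift Q1 and Q3 slightly.

```python
import statistics


def quantile_summary(values) -> dict:
    """Min, quartiles, max, range and IQR, as in the Quantile Statistics table."""
    data = sorted(values)
    # Assumed interpolation: linear between order statistics ("inclusive" method)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


print(quantile_summary([1, 2, 3, 4, 5]))
# {'min': 1, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5, 'range': 4, 'iqr': 2.0}
```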

Descriptive Statistics

Mean 41.8859
Standard Deviation 22.5168
Variance 507.0081
Sum 4.1886e+06
Skewness -0.05198
Kurtosis -1.0038
Coefficient of Variation 0.5376
  • age is not normally distributed (p-value ≈ 6.30e-08)
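Skewness and kurtosis are ratios of central moments. Because the kurtosis reported for age is negative, it must be excess kurtosis (Pearson kurtosis minus 3), since Pearson kurtosis itself can never be below 1. A sketch using the simple population-moment estimators g1 and g2; the report does not state which estimator it uses, and bias-corrected versions differ only negligibly at 100000 rows.

```python
def shape_stats(values) -> dict:
    """Skewness and excess kurtosis from population (biased) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "skewness": m3 / m2 ** 1.5,           # g1 < 0: longer left tail; > 0: longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


print(shape_stats([1, 2, 3, 4, 5]))   # symmetric: skewness 0, excess kurtosis about -1.3
print(shape_stats([1, 1, 1, 1, 10]))  # long right tail: positive skewness
```

Read against the table, γ1 = -0.052 is very close to 0, while an excess kurtosis of about -1 points to a flatter, lighter-tailed shape than a normal distribution, which may be what the normality test is picking up.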

hypertension

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.3 MB
  • The most frequent value (0) occurs over 12.36 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Character Categories

Letter Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • hypertension values have a constant length of 1 character
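The ratio insight compares how often the two most frequent values occur, not the values themselves. For a two-valued column with count ratio r, the rarer value makes up 1/(1 + r) of the rows, so a ratio of 12.36 corresponds to roughly 7.5% of rows with hypertension = 1 (at most, since the ratio is "over" 12.36). A stdlib sketch; both helper names are hypothetical:

```python
from collections import Counter


def top_two_ratio(values):
    """Return (most frequent value, second most frequent value, ratio of their counts)."""
    (top, n_top), (second, n_second) = Counter(values).most_common(2)
    return top, second, n_top / n_second


def minority_share(ratio: float) -> float:
    """Share of rows holding the rarer value in a two-valued column with the given count ratio."""
    return 1 / (1 + ratio)


print(top_two_ratio([0, 0, 0, 1]))  # (0, 1, 3.0)
for name, r in [("hypertension", 12.36), ("heart_disease", 24.37), ("diabetes", 10.76)]:
    print(name, round(100 * minority_share(r), 1))  # approximate % of rows equal to 1
```

Applied to the three binary columns, this gives about 7.5%, 3.9% and 8.5% of rows equal to 1, so all three are strongly imbalanced.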

heart_disease

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.3 MB
  • The most frequent value (0) occurs over 24.37 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 0
4th row 0
5th row 1

Character Categories

Letter Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • heart_disease values have a constant length of 1 character

smoking_history

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.8 MB

Length

Mean 6.3423
Standard Deviation 1.5674
Median 7
Minimum 4
Maximum 11

Sample

1st row never
2nd row No Info
3rd row never
4th row current
5th row current

Character Categories

Letter Count 591971
Lowercase Letter 520339
Space Separator 42263
Uppercase Letter 71632
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (No Info, never) take over 50.0%
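The Length table summarizes how many characters each value has. A stdlib sketch (hypothetical helper name); whether the report uses the sample or the population standard deviation is not stated, so `statistics.stdev` (sample) is an assumption.

```python
import statistics


def length_stats(values) -> dict:
    """Summary of value lengths in characters, as in the Length table (hypothetical helper)."""
    lengths = [len(str(v)) for v in values]
    return {
        "mean": statistics.fmean(lengths),
        "std": statistics.stdev(lengths),  # assumed: sample standard deviation
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


sample = ["never", "No Info", "never", "current", "current"]  # first five rows above
print(length_stats(sample))
```

On the five sample rows the lengths are 5, 7, 5, 7 and 7; the full column's minimum of 4 and maximum of 11 come from categories that do not appear in this sample.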

bmi

numerical

Approximate Distinct Count 4247
Approximate Unique (%) 4.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.5 MB
Mean 27.3208
Minimum 10.01
Maximum 95.69
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bmi is skewed right (γ1 = 1.0438)

Quantile Statistics

Minimum 10.01
5-th Percentile 16.82
Q1 23.63
Median 27.32
Q3 29.58
95-th Percentile 39.49
Maximum 95.69
Range 85.68
IQR 5.95

Descriptive Statistics

Mean 27.3208
Standard Deviation 6.6368
Variance 44.0469
Sum 2.7321e+06
Skewness 1.0438
Kurtosis 3.5205
Coefficient of Variation 0.2429
  • bmi is not normally distributed (p-value ≈ 3.29e-20)
  • bmi has 7086 outliers
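The report does not define "outlier". One common rule, assumed here, is Tukey's fences: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. For bmi that gives fences of 23.63 - 8.925 = 14.705 and 29.58 + 8.925 = 38.505. A stdlib sketch; the helper names and the quartile interpolation method are assumptions:

```python
import statistics


def fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Tukey's fences (assumed outlier rule): (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k: float = 1.5) -> list:
    """Values outside Tukey's fences, with linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = fences(q1, q3, k)
    return [x for x in values if x < low or x > high]


print(fences(23.63, 29.58))  # approximately (14.705, 38.505) for bmi
print(tukey_outliers([1, 2, 3, 4, 5, 100]))  # [100]
```

Under this assumed rule, the 95th percentile (39.49) already lies above the upper fence, so roughly 5% or more of rows would be flagged on the high side alone, which is in line with the 7086 (about 7.1%) reported.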

HbA1c_level

numerical

Approximate Distinct Count 18
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.5 MB
Mean 5.5275
Minimum 3.5
Maximum 9
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • HbA1c_level is only slightly skewed left (γ1 = -0.0669, close to 0)

Quantile Statistics

Minimum 3.5
5-th Percentile 3.5
Q1 4.8
Median 5.8
Q3 6.2
95-th Percentile 6.6
Maximum 9
Range 5.5
IQR 1.4

Descriptive Statistics

Mean 5.5275
Standard Deviation 1.0707
Variance 1.1463
Sum 552750.7
Skewness -0.06685
Kurtosis 0.2153
Coefficient of Variation 0.1937
  • HbA1c_level is not normally distributed (p-value ≈ 7.76e-07)
  • HbA1c_level has 1315 outliers
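Assuming Tukey's 1.5 × IQR fences, a common convention that the report does not confirm, the HbA1c_level quartiles give the bounds below. Because the minimum (3.5) is above the lower fence, every value flagged under this rule would lie above 8.3, in the upper tail that extends to the maximum of 9.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 6.2 - 4.8 = 1.4 \\
Q_1 - 1.5\,\mathrm{IQR} &= 4.8 - 2.1 = 2.7 \\
Q_3 + 1.5\,\mathrm{IQR} &= 6.2 + 2.1 = 8.3
\end{aligned}
```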

blood_glucose_level

numerical

Approximate Distinct Count 18
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.5 MB
Mean 138.0581
Minimum 80
Maximum 300
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • blood_glucose_level is skewed right (γ1 = 0.8216)

Quantile Statistics

Minimum 80
5-th Percentile 80
Q1 100
Median 140
Q3 159
95-th Percentile 200
Maximum 300
Range 220
IQR 59

Descriptive Statistics

Mean 138.0581
Standard Deviation 40.7081
Variance 1657.1523
Sum 1.3806e+07
Skewness 0.8216
Kurtosis 1.7375
Coefficient of Variation 0.2949
  • blood_glucose_level is not normally distributed (p-value ≈ 5.56e-12)
  • blood_glucose_level has 2038 outliers
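The rows of the Descriptive Statistics table are tied together by standard identities, which allows a quick consistency check. With n = 100000 rows, μ the mean and σ the standard deviation, the blood_glucose_level values agree up to the rounding of σ:

```latex
\begin{aligned}
\sigma^2 &= 40.7081^2 \approx 1657.15 \quad (\text{reported variance: } 1657.1523) \\
\mathrm{CV} &= \sigma / \mu = 40.7081 / 138.0581 \approx 0.2949 \\
\text{Sum} &= n\,\mu = 100000 \times 138.0581 \approx 1.3806 \times 10^{7}
\end{aligned}
```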

diabetes

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.3 MB
  • The most frequent value (0) occurs over 10.76 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Character Categories

Letter Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • diabetes values have a constant length of 1 character
